OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Notes on Contemporary Table Recognition

Internal identifier: 001037 (Main/Exploration); previous: 001036; next: 001038

Authors: W. Embley [United States]; Daniel Lopresti [United States]; George Nagy (computer scientist) [United States]

RBID : ISTEX:FF306D1B3471A2356EE4C01FC02514B76A0CA69F

Abstract

The shift of interest to web tables in HTML and PDF files, coupled with the incorporation of table analysis and conversion routines in commercial desktop document processing software, is likely to turn table recognition into more of a systems issue than an algorithmic one. We illustrate the transition with some actual examples of web table conversion. We then suggest that the appropriate target format for table analysis, whether performed by conventional customized programs or by off-the-shelf software, is a representation based on the abstract table introduced by X. Wang in 1996. We show that the Wang model is adequate for some useful tasks that prove elusive for less explicit representations, and outline our plans to develop a semi-automated table processing system to demonstrate this approach. Screen snapshots of a prototype tool to allow table mark-up in the style of Wang are also presented.
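
To make the idea of an abstract table more concrete, here is a minimal sketch in Python. It assumes a simplified reading of the Wang model: a table is a set of categories, each holding label paths, plus a mapping from one label path per category to an entry value. The class name AbstractTable, its methods, and the sample grades data are all hypothetical and chosen for illustration; this is not the authors' prototype or Wang's original formalism.

```python
class AbstractTable:
    """Simplified, hypothetical Wang-style abstract table.

    categories: dict mapping a category name to its allowed label paths,
    where each label path is a tuple of strings (root label first).
    entries: maps one label path per category to a value, independently
    of any row/column layout.
    """

    def __init__(self, categories):
        self.categories = categories
        self.entries = {}

    def _key(self, labels):
        # Require exactly one label path for every category.
        if set(labels) != set(self.categories):
            raise KeyError("one label path per category is required")
        for cat, path in labels.items():
            if path not in self.categories[cat]:
                raise KeyError(f"unknown label {path!r} in category {cat!r}")
        # The key is an unordered set of (category, path) pairs, so the
        # order in which categories are named does not matter.
        return frozenset(labels.items())

    def set(self, value, **labels):
        self.entries[self._key(labels)] = value

    def get(self, **labels):
        return self.entries[self._key(labels)]


# Illustrative data only (invented for this sketch).
table = AbstractTable({
    "Year": [("2004",), ("2005",)],
    "Term": [("Fall",), ("Spring",)],
    "Mark": [("Assignments", "Ass1"), ("Assignments", "Ass2"), ("Final",)],
})
table.set(85, Year=("2004",), Term=("Fall",), Mark=("Assignments", "Ass1"))
print(table.get(Mark=("Assignments", "Ass1"), Term=("Fall",), Year=("2004",)))
```

Because entries are keyed by labels rather than by grid position, the same lookup works whichever category ends up as rows or columns in a rendered table. That independence from layout is the property the abstract points to when it calls the representation explicit.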

DOI: 10.1007/11669487_15



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Notes on Contemporary Table Recognition</title>
<author>
<name sortKey="Embley, W" sort="Embley, W" uniqKey="Embley W" first="W." last="Embley">W. Embley</name>
</author>
<author>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
</author>
<author>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation>
<country>États-Unis</country>
<placeName>
<settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:FF306D1B3471A2356EE4C01FC02514B76A0CA69F</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_15</idno>
<idno type="url">https://api.istex.fr/document/FF306D1B3471A2356EE4C01FC02514B76A0CA69F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001288</idno>
<idno type="wicri:Area/Istex/Curation">001211</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A10</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Embley W:notes:on:contemporary</idno>
<idno type="wicri:Area/Main/Merge">001054</idno>
<idno type="wicri:Area/Main/Curation">001037</idno>
<idno type="wicri:Area/Main/Exploration">001037</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Notes on Contemporary Table Recognition</title>
<author>
<name sortKey="Embley, W" sort="Embley, W" uniqKey="Embley W" first="W." last="Embley">W. Embley</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Utah</region>
</placeName>
<wicri:cityArea>Computer Science Department, Brigham Young University, 84602, Provo</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Pennsylvanie</region>
</placeName>
<wicri:cityArea>Department of Computer Science and Engineering, Lehigh University, 18015, Bethlehem</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation wicri:level="2">
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
<wicri:cityArea>Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, 12180, Troy</wicri:cityArea>
<placeName>
<settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">États-Unis</country>
<placeName>
<settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">FF306D1B3471A2356EE4C01FC02514B76A0CA69F</idno>
<idno type="DOI">10.1007/11669487_15</idno>
<idno type="ChapterID">15</idno>
<idno type="ChapterID">Chap15</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The shift of interest to web tables in HTML and PDF files, coupled with the incorporation of table analysis and conversion routines in commercial desktop document processing software, is likely to turn table recognition into more of a systems than an algorithmic issue. We illustrate the transition by some actual examples of web table conversion. We then suggest that the appropriate target format for table analysis, whether performed by conventional customized programs or by off-the-shelf software, is a representation based on the abstract table introduced by X. Wang in 1996. We show that the Wang model is adequate for some useful tasks that prove elusive for less explicit representations, and outline our plans to develop a semi-automated table processing system to demonstrate this approach. Screen snapshots of a prototype tool to allow table mark-up in the style of Wang are also presented.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Pennsylvanie</li>
<li>Utah</li>
<li>État de New York</li>
</region>
<settlement>
<li>Troy (New York)</li>
</settlement>
<orgName>
<li>Institut polytechnique Rensselaer</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="Utah">
<name sortKey="Embley, W" sort="Embley, W" uniqKey="Embley W" first="W." last="Embley">W. Embley</name>
</region>
<name sortKey="Embley, W" sort="Embley, W" uniqKey="Embley W" first="W." last="Embley">W. Embley</name>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001037 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001037 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:FF306D1B3471A2356EE4C01FC02514B76A0CA69F
   |texte=   Notes on Contemporary Table Recognition
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024